The predictive power of data: machine learning analysis for Covid-19 mortality based on personal, clinical, preclinical, and laboratory variables in a case–control study

Background and purpose: The COVID-19 pandemic has presented unprecedented public health challenges worldwide. Understanding the factors contributing to COVID-19 mortality is critical for effective management and intervention strategies. This study aims to unlock the predictive power of data collected from personal, clinical, preclinical, and laboratory variables through machine learning (ML) analyses.
Methods: A retrospective study was conducted in 2022 in a large hospital in Abadan, Iran. Data were collected and categorized into demographic, clinical, comorbid, treatment, initial vital signs, symptoms, and laboratory test groups. The collected data were subjected to ML analysis to identify predictive factors associated with COVID-19 mortality. Five algorithms were used to analyze the data set and derive the latent predictive power of the variables using Shapley additive explanation (SHAP) values.
Results: The results highlight key factors associated with COVID-19 mortality, including age, comorbidities (hypertension, diabetes), specific treatments (antibiotics, remdesivir, favipiravir, vitamin zinc), and clinical indicators (heart rate, respiratory rate, temperature). Notably, specific symptoms (productive cough, dyspnea, delirium) and laboratory values (D-dimer, ESR) also play a critical role in predicting outcomes. This study highlights the importance of feature selection and the impact of data quantity and quality on model performance.
Conclusion: This study highlights the potential of ML analysis to improve the accuracy of COVID-19 mortality prediction and emphasizes the need for a comprehensive approach that considers multiple feature categories. It highlights the critical role of data quality and quantity in improving model performance and contributes to our understanding of the multifaceted factors that influence COVID-19 outcomes.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12879-024-09298-w.


Introduction
The World Health Organization (WHO) declared COVID-19 a global pandemic in March 2020 [1]. The first cases of SARS-CoV-2, a new severe acute respiratory syndrome coronavirus, were detected in Wuhan, China, and the virus rapidly spread to become a global public health problem [2]. The clinical presentation and symptoms of COVID-19 may be similar to those of Middle East Respiratory Syndrome (MERS) and Severe Acute Respiratory Syndrome (SARS); however, the rate of spread is higher [3]. By December 31, 2022, more than 729 million cases and nearly 6.7 million deaths (0.92%) had been confirmed in 219 countries worldwide [4]. For many countries, determining which measures to take to prevent death or serious illness is a major challenge. Due to the complexity of transmission and the lack of proven treatments, COVID-19 is a major challenge worldwide [5,6]. In middle- and low-income countries, the situation is even more catastrophic due to high illiteracy rates, a very poor health care system, and a lack of intensive care units [5]. In addition, understanding the factors contributing to COVID-19 mortality is critical for effective management and intervention strategies [6].
Accurate diagnosis and treatment of the disease requires a comprehensive assessment that considers a variety of factors. These factors include personal factors such as medical history, lifestyle, and genetics; clinical factors such as observations on physical examinations and physician reports; preclinical factors such as early detection through screening or surveillance; laboratory factors such as results of diagnostic tests and medical imaging; and patient-reported signs and symptoms. However, the variety of characteristics associated with COVID-19 makes it difficult for physicians to accurately classify COVID-19 patients during the pandemic.
In today's digital transformation era, machine learning plays a vital role in various industries, including healthcare, where substantial data is generated daily [19][20][21]. Numerous studies have explored machine learning (ML) and explainable artificial intelligence (AI) in predicting COVID-19 prognosis and diagnosis [22][23][24][25]. Chadaga et al. have developed decision support systems and triage prediction systems using clinical markers and biomarkers [22,23]. Similarly, Khanna et al. have developed an ML and explainable AI system for COVID-19 triage prediction [24]. Zoabi has also contributed to this field, developing ML models that predict COVID-19 test results with high accuracy based on a small number of features such as gender, age, contact with an infected person, and initial clinical symptoms [25]. These studies emphasize the potential of ML and explainable AI to improve COVID-19 prediction and diagnosis. Nonetheless, the efficacy of ML algorithms relies heavily on the quality and quantity of the data used for training. Recent research has indicated that increasing the volume of data can significantly enhance the performance of deep learning algorithms compared to traditional ML methods [26]. However, the impact of data volume on model performance can vary with data characteristics and experimental setup, so data for model training must be selected and analyzed carefully. While these studies emphasize the importance of features in training ML algorithms for COVID-19 prediction and diagnosis, additional research is required on methods to enhance the interpretability of features.
Therefore, the primary aim of this study is to identify the key factors associated with mortality in COVID-19 patients admitted to hospitals in Abadan, Iran. For this purpose, seven categories of factors were selected (demographic, clinical and conditions, comorbidities, treatments, initial vital signs, symptoms, and laboratory tests), and machine learning algorithms were employed. The predictive power of the data was assessed using 139 predictor variables across the seven feature sets. Our second goal is to improve the interpretability of the extracted important features. To achieve this goal, we use SHAP analysis, which illustrates the impact of features through a diagram.

Study population and data collection
Using data from the COVID-19 hospital-based registry database, a retrospective study was conducted from April 2020 to December 2022 at Ayatollah Talleghani Hospital (a COVID-19 referral center) in Abadan City, Iran.
A total of 14,938 patients were initially screened for eligibility. Of these, 9509 patients were excluded because their reverse transcriptase polymerase chain reaction (RT-PCR) test results were negative or unspecified. Incomplete or missing data are a common issue in medical research, particularly when electronic medical records (EMRs) are used [27]. A further 1623 patients were excluded because more than 70% of the data in their medical records were incomplete or missing; that is, their records contained less than 30% of the data required for a meaningful analysis. Patients younger than 18 years were also not included in the study. The threshold was set to ensure that the dataset contained a sufficient amount of complete and reliable information to draw accurate conclusions. Incomplete or missing data in a medical record may relate to key variables such as patient demographics, symptoms, laboratory results, treatment information, outcomes, or other data points important to the research. Insufficient data can affect the validity and reliability of study results and introduce bias or inaccuracies, so such records were excluded to maintain the quality and integrity of the findings. After these exclusions, 3806 patients remained. Of these patients, 474 died due to COVID-19, while the remaining 3332 patients recovered and were candidates for the control group. To obtain a balanced sample, the control group was selected using propensity score matching (PSM). PSM is a statistical technique that creates a balanced comparison group by matching individuals in the control group (here, the survived group) with individuals in the case group (here, the deceased group) based on their propensity scores. In this study, the propensity score of each person represented the probability of death (coded as a binary outcome; survived = 0, deceased = 1), calculated from a set of covariates (demographic factors) using the matchit function from the MatchIt library. Two individuals, one from the deceased group and one from the survived group, are considered matched if the difference between their propensity scores is small. Non-matching participants are discarded. Matching aims to reduce bias by making the distribution of observed characteristics similar between groups, which improves the comparability of groups in observational studies [28]. In total, the study included 1063 COVID-19 patients who belonged to either the deceased group (case = 474) or the survived group (control = 589) (Fig. 1).
In the COVID-19 hospital-based registry database, 140 primary features in eight main classes were recorded for COVID-19 patients: patient demographics (eight features), clinical and conditions features (16 features), comorbidities (18 features), treatment (17 features), initial vital signs (14 features), symptoms during hospitalization (31 features), laboratory results (35 features), and an output (0 for survived and 1 for deceased). The main features included in the hospital-based COVID-19 registry database are provided in Appendix Table 1.
Fig. 1 Flowchart describing the process of patient selection
To ensure the accuracy of the recorded information, discharged patients or their relatives were called and asked to review some of the recorded information (demographic information, symptoms, and medical history). Clinical symptoms and vital signs were referenced to the first day of hospitalization (at admission). Laboratory test results were also referenced to the patient's first blood sample at the time of hospitalization.
The study analyzed 140 variables in patients' records, normalizing continuous variables and creating a binary feature to categorize patients based on outcomes. To address the issue of an imbalanced dataset, the Synthetic Minority Over-sampling Technique (SMOTE) was utilized. Some classes were combined to simplify variables.
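The idea behind SMOTE can be sketched in a few lines of Python. This is an illustration of the general technique, not the implementation used in the study: each synthetic sample is placed at a random point on the segment between a minority-class sample and one of its nearest minority-class neighbours. The function name `smote`, the neighbour count `k`, and the toy points are assumptions.

```python
import random


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours.

    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest neighbours of `base` among the other minority samples,
        # ranked by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy minority-class samples with two normalized features (hypothetical).
minority = [[1.0, 2.0], [2.0, 3.0], [3.0, 3.5], [1.5, 2.5]]
new_points = smote(minority, n_new=5)
print(len(new_points))
```

Because every synthetic point lies between two real minority samples, the new points stay inside the region already occupied by that class.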
For missing data, an imputation technique was applied, assuming that values were missing at random [29]. Little's MCAR test was performed with the naniar package to assess whether the missing data in the dataset were missing completely at random (MCAR) [30]. The null hypothesis in this test is that the data are MCAR, and the test statistic is a chi-square value.
The Ethics Committee of Abadan University of Medical Science approved the research protocol (No. IR.ABADANUMS.REC.1401.095).

Predictor variables
All data were collected from medical records in eight categories: seven predictor categories (demographic, clinical and conditions, comorbidities, treatment, initial vital signs, symptoms, and laboratory tests) and the outcome, for a total of 140 variables.
The "Demographics" category encompasses eight features, three of which are binary variables and five of which are categorical. The "Clinical Conditions" category includes 16 features, comprising one quantitative variable, 12 binary variables, and five categorical features.

Outcome variable
The primary outcome variable was mortality, with December 31, 2022, as the last date of follow-up. The outcome is a binary class variable: for any patient in the survivor group it is 0; otherwise, it is 1. In this study, 44.59% (n = 474) of the samples were in the deceased group and were labeled 1.

Data balancing
In case-control studies, it is common to have groups of unequal size since cases are typically fewer than controls [31]. However, in case-control studies with groups of equal size, data balancing may not be necessary for ML algorithms [32]. When using ML algorithms, data balancing is generally important when there is an imbalance between classes, i.e., when one class has significantly fewer observations than the other [33]. In such cases, balancing can improve the performance of the algorithm by reducing the bias in favor of the majority class [34]. For case-control studies with groups of the same size, class balance has already been reached and balancing may not be necessary. However, it is always recommended to evaluate the performance of the ML algorithm with the given data set to determine the need for data balancing, because unbalanced case-control ratios can inflate type I error rates, whereas balanced studies can show deflated type I error rates [35].

Feature selection
Feature selection is the process of selecting important variables from a large dataset for use in an ML model to achieve better performance and efficiency. Another goal of feature selection is to reduce computational effort by eliminating irrelevant or redundant features [36,37]. Before generating predictions, it is important to perform feature selection to improve the accuracy of clinical decisions and reduce errors [37]. To identify the best predictors, researchers often compare the effectiveness of different feature selection methods. In this study, we used five common methods, namely Decision Tree (DT), eXtreme Gradient Boosting (XGBoost), Support Vector Machine (SVM), Naïve Bayes (NB), and Random Forest (RF), to select relevant features for predicting mortality of COVID-19 patients. To avoid overfitting, we performed ten-fold cross-validation when training on our dataset. This approach may help ensure that our model is optimized for accurate predictions of health status in COVID-19 patients.
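As an illustration of ten-fold cross-validation with class-stratified folds, the following Python sketch deals sample indices into folds so that each fold keeps roughly the overall share of deceased patients. It is a sketch only: the study was carried out in R, and the function name `stratified_folds` and the seed are assumptions. The class sizes (589 survivors, 474 deceased) are taken from the study.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Assign each sample index to one of k folds, keeping the class
    proportions in every fold close to those of the full data set."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of this class round-robin across folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


# Outcome labels: 0 = survived, 1 = deceased.
labels = [0] * 589 + [1] * 474
folds = stratified_folds(labels, k=10)

for held_out in folds:
    held_out_set = set(held_out)
    train = [i for i in range(len(labels)) if i not in held_out_set]
    deceased_share = sum(labels[i] for i in held_out) / len(held_out)
    # A model would be fitted on `train` and scored on `held_out` here.
    print(len(train), len(held_out), round(deceased_share, 3))
```

Each sample is held out exactly once, so every performance estimate comes from data the corresponding model did not see during fitting.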

Model development, evaluation, and clarity
In this study, the predictive models were developed with five ML algorithms (DT, XGBoost, SVM, NB, and RF) using the R programming language (v4.3.1) and its packages [38]. We used cross-validation (CV) to tune the hyperparameters of our models based on the training subset of the dataset. For training and evaluating our ML models, we used a common technique called tenfold cross-validation [39]. The primary training dataset was divided into ten folds, each containing 10% of the total data, using stratified random sampling. For each held-out 30% of the data, an ML model was built and trained on the remaining 70% of the data.
The performance of the model was then evaluated on the held-out 30% sample. This process was repeated 100 times with different training and test combinations, and the average performance was reported. Performance measures include sensitivity (recall), specificity, accuracy, F1-score, and the area under the receiver operating characteristic curve (AUC ROC). With TP, TN, FP, and FN denoting true positives, true negatives, false positives, and false negatives, sensitivity is defined as TP / (TP + FN), specificity as TN / (TN + FP), and accuracy as (TP + TN) / (TP + TN + FP + FN). The F1-score is the harmonic mean of precision and recall with equal weight, where precision equals TP / (TP + FP). AUC refers to the area under the ROC curve. In the evaluation of ML techniques, values were classified as poor if below 50%, acceptable if between 50 and 80%, good if between 80 and 90%, and very good if greater than 90%. These criteria are commonly used in reporting model evaluations [40,41].
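The confusion-matrix metrics named above can be computed as in the following Python sketch. The function name `classification_metrics` and the example counts are hypothetical; they are not results from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Metrics from confusion-matrix counts (deceased = positive class).

    Assumes every denominator is non-zero.
    """
    sensitivity = tp / (tp + fn)          # recall: deceased correctly flagged
    specificity = tn / (tn + fp)          # survivors correctly cleared
    precision = tp / (tp + fp)            # flagged patients who actually died
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # Harmonic mean of precision and recall with equal weight.
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
    }


# Hypothetical counts for one held-out sample of 100 patients.
m = classification_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

In this toy case accuracy is 0.85 while sensitivity is about 0.89 and specificity about 0.82, which shows why the metrics are reported together rather than accuracy alone.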
Finally, the SHapley Additive exPlanations (SHAP) method was used to provide clarity and understanding of the models. SHAP uses cooperative game theory to determine how each feature contributes to the predictions of ML models. This approach allows the contribution of each feature to a model's prediction to be computed [42,43]. For this purpose, the package shapr was used, which includes a modified version of the kernel SHAP approach that takes into account the interdependence of the features when computing the Shapley values [44].
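To make the game-theoretic idea concrete, the sketch below computes exact Shapley values by enumerating every feature coalition for a small toy model. It is an illustration under stated assumptions, not the shapr procedure: kernel SHAP approximates these values rather than enumerating coalitions, and the function names, the toy risk score, and the zero baseline for absent features are all hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(value, n_features):
    """Exact Shapley values for a coalition value function.

    `value(S)` takes a frozenset of feature indices and returns the model
    output when only those features are present.
    """
    phi = [0.0] * n_features
    players = range(n_features)
    for i in players:
        rest = [j for j in players if j != i]
        for size in range(len(rest) + 1):
            # Probability that exactly `size` features precede feature i
            # in a random ordering, spread over coalitions of that size.
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(rest, size):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Toy patient with three feature values (hypothetical).
x = [2.0, 1.0, 3.0]


def toy_value(present):
    # Features outside the coalition are replaced by a baseline of 0.
    z = [x[j] if j in present else 0.0 for j in range(len(x))]
    # Linear terms plus an interaction between features 0 and 1.
    return 1.0 * z[0] + 2.0 * z[1] + 0.5 * z[2] + 1.0 * z[0] * z[1]


phi = shapley_values(toy_value, 3)
print(phi)
```

The contributions sum to the difference between the full prediction and the baseline prediction, and the interaction term is shared equally between the two features involved. Enumeration grows exponentially with the number of features, which is why approximations such as kernel SHAP are used in practice.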

Patient characteristics
Table 1 shows the baseline characteristics of patients infected with COVID-19, including demographic data such as age and sex and other factors such as occupation, place of residence, marital status, education level, BMI, and season of admission. A total of 1063 adult patients (≥ 18 years) were enrolled in the study, of whom 589 (55.41%) survived and 474 (44.59%) died. Analysis showed that age was significantly different between the two groups, with a mean age of 54.70 ± 15.60 years in the survivor group versus 65.53 ± 15.18 years in the deceased group (P < 0.001). There was also a significant association between age and survival, with a higher proportion of patients aged < 40 years in the survivor group (77.0%) than in the deceased group (23.0%) (P < 0.001). No significant differences were found between the two groups in terms of sex, occupation, place of residence, marital status, and time of admission. However, there was a significant association between educational level and survival, with a lower proportion of patients with a college degree in the deceased group (37.2%) than in the survivor group (62.8%) (P = 0.017). BMI also differed significantly between the two groups, with the proportion of patients with a BMI > 30 kg/m² being higher in the deceased group (56.5%) than in the survivor group (43.5%) (P < 0.001).

Clinical and conditions
Table 2 provides important insights into the clinical and condition characteristics associated with COVID-19 infection outcomes. The results show that patients who survived the infection had a significantly shorter hospitalization time (2.20 ± 1.63 days) compared to those who died (4.05 ± 3.10 days) (P < 0.001). Patients who were admitted as elective cases had a higher survival rate (84.6%) compared to those who were admitted as urgent (61.3%) or emergency (47.4%) cases. There were no significant differences with regard to the number of infections or family infection history. However, patients who had a history of travel had a lower mortality rate (40.1%).
A significantly higher proportion of deceased patients required CPR (54.7% vs. 45.3%). Patients who had underlying medical conditions had a significantly lower survival rate (38.3%), with hyperlipidemia being the most prevalent condition (18.7%). Patients who had a history of alcohol consumption (12.5%), transplantation (30.0%), chemotherapeutic (21.4%) or special drug use (0.0%), and immunosuppressive drug use (30.0%) also had a lower survival rate. Pregnant patients (44.4%) had similar survival outcomes compared to non-pregnant patients (55.6%). Patients who were recent or current smokers (36.4%) also had a significantly lower survival rate.
The P-values reported in the table show that some symptoms are significantly associated with death, including productive cough, dyspnea, sore throat, headache, delirium, olfactory symptoms, dyspepsia, nausea, vomiting, sepsis, respiratory failure, heart failure, MODS, coagulopathy, secondary infection, stroke, acidosis, and admission to the intensive care unit. Surviving and deceased patients also differed significantly in the average number of days spent in the ICU. There was no significant association between patient outcomes and symptoms such as nonproductive cough, chills, diarrhea, chest pain, and hyperglycemia.

Laboratory tests
Table 7 shows the laboratory values of COVID-19 patients with the average values of the different laboratory results. The results show that the deceased patients had significantly lower levels of red blood cells (3.78 × 10^6/µL vs. 5.01 × 10^6/µL), hemoglobin (11.22 g/dL vs. 14.10 g/dL), and hematocrit (34.10% vs. 42.46%), whereas basophils and white blood cells did not differ significantly. Other laboratory values with statistically significant differences between the two groups (P < 0.001) were INR, ESR, BUN, Cr, Na, K, P, PLT, TSH, T3, and T4. The surviving patients generally had lower values in these laboratory characteristics than the deceased patients.

Model performance and evaluation
Five ML algorithms, namely DT, XGBoost, SVM, NB, and RF, were used in this study to build COVID-19 mortality prediction models. The models were based on the optimal feature set selected in a previous step and were trained on the same data set. The effectiveness of the models was evaluated by calculating sensitivity, specificity, accuracy, F1 score, and AUC metrics. Table 8 shows the results of this performance evaluation. The average values from the test set are expressed as the mean (standard deviation).
The results show that the performance of the models varies widely across the feature categories. The Laboratory Tests category achieved the highest performance, with all models scoring 100% in all metrics. The Symptoms and initial Vital Signs categories also show high performance, with XGBoost achieving the highest accuracy of 98.03% and DT achieving the highest sensitivity of 92.79%.
The Clinical and Conditions category also showed high performance, with all models showing accuracy above 91%. XGBoost achieved the highest sensitivity and specificity of 92.74% and 92.96%, respectively. In contrast, the Demographics category showed the lowest performance, with all models achieving less than 66.5% accuracy.
In summary, the results suggest that certain feature categories may be more useful than others in predicting mortality from COVID-19 and that some ML models may perform better than others depending on the feature category used.

Feature importance
SHapley Additive exPlanations (SHAP) values indicate the importance or contribution of each feature in predicting model output. These values help to understand the influence and importance of each feature on the model's decision-making process.
In Fig. 2, the mean absolute SHAP values are shown to depict global feature importance. Figure 2 shows the contribution of each feature within its respective group as calculated by the XGBoost prediction model using SHAP. According to the SHAP method, the features that had the greatest impact on predicting COVID-19 mortality were, in descending order: D-dimer, CPR, PEEP, underlying disease, ESR, antifungal treatment, PaO2, age, dyspnea, and nausea.
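Global importance of this kind is obtained by averaging the absolute SHAP value of each feature over all patients and ranking the features by that average. A minimal Python sketch follows; the function name `global_importance` and the numbers are hypothetical and are not values from the study.

```python
def global_importance(shap_rows, feature_names):
    """Rank features by mean absolute SHAP value across samples."""
    n = len(shap_rows)
    scores = {
        name: sum(abs(row[j]) for row in shap_rows) / n
        for j, name in enumerate(feature_names)
    }
    # Largest mean |SHAP| first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical per-patient SHAP values (rows) for three features (columns).
names = ["D-dimer", "age", "nausea"]
shap_rows = [
    [0.40, -0.10, 0.05],
    [-0.30, 0.20, -0.05],
    [0.50, -0.30, 0.00],
]
ranking = global_importance(shap_rows, names)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Taking absolute values before averaging keeps positive and negative contributions from cancelling, so the ranking reflects the size of a feature's influence rather than its direction.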
Fig. 3, in turn, presents the local explanation summary that indicates the direction of the relationship between a variable and COVID-19 outcome. As shown in Fig. 3 (I to VII), older age and very low BMI were the two demographic factors with the greatest impact on model outcome, followed by clinical factors such as higher CPR, hospitalization, and hyperlipidemia. Higher mortality rates were associated with patients who smoked and had traveled in the past 14 days. Patients with underlying diseases, especially HTN, died more frequently. In contrast, the use of remdesivir, Vit Zn, and favipiravir was associated with lower mortality. Initial vital signs such as high PEEP, low PaO2, and RR had the greatest impact, as did symptoms such as dyspnea, MODS, sore throat, and LOC. A higher risk of mortality was observed in patients with higher D-dimer levels and ESR, the most consequential laboratory tests, followed by K, AST, and CPK-MB.
Using the feature types listed in Appendix Table 1, Fig. 4 shows that the performance of ML algorithms can be improved by increasing the number of features used in training, especially in distinguishing between symptoms, comorbidities, and treatments. In addition, the amount and quality of data used for training can significantly affect algorithm performance, with laboratory tests being more informative than initial vital signs. Regarding the influence of features, quantitative features tend to have a more positive effect on performance than qualitative features; clinical conditions tend to be more informative than demographic data. Thus, both the amount of data and the type of features used have a significant impact on the performance of ML algorithms.

Discussion
The COVID-19 pandemic has presented unprecedented public health challenges worldwide and requires a deep understanding of the factors contributing to COVID-19 mortality to enable effective management and intervention. This study used machine learning analysis to uncover the predictive power of an extensive dataset that includes a wide range of personal, clinical, preclinical, and laboratory variables associated with COVID-19 mortality.
This study confirms previous research on COVID-19 outcomes that highlighted age as a significant predictor of mortality [45][46][47], along with comorbidities such as hypertension and diabetes [48,49]. Underlying conditions such as cardiovascular and renal disease also contribute to mortality risk [50,51].
Initial vital signs such as heart rate, respiratory rate, temperature, and oxygen therapy differ between surviving and deceased patients [55]. Deceased patients often have increased heart rate, lower respiratory rate, higher temperature, and increased oxygen requirements, which can serve as early indicators of disease severity. Symptoms such as productive cough, dyspnea, and delirium are significantly associated with COVID-19 mortality, emphasizing the need for immediate monitoring and intervention [56]. Laboratory tests show altered hematologic and biochemical markers in deceased patients, underscoring the importance of routine laboratory monitoring in COVID-19 patients [57,58].
ML algorithms were used in the study to predict COVID-19 mortality based on these multilayered variables. XGBoost and Random Forest performed better than the other algorithms and had high recall, specificity, accuracy, F1 score, and AUC. This highlights the potential of ML, particularly the XGBoost algorithm, in improving prediction accuracy for COVID-19 mortality [59]. The study also highlighted the importance of drug choice in treatment. However, the study's findings differ from those of Moulaei [60], Nopour [61], and Mehraeen [62] in terms of the best-performing ML algorithm and the most influential variables. While Moulaei [60] found that the random forest algorithm had the best performance, Nopour [61] and Ikemura [63] identified the artificial neural network and stacked ensemble models, respectively, as the most effective. Additionally, the most influential variables in predicting mortality varied across the studies, with Moulaei [60] highlighting dyspnea, ICU admission, and oxygen therapy, and Ikemura [63] identifying systolic and diastolic blood pressure, age, and other biomarkers. These differences may be attributed to variations in the datasets, feature selection, and model training.
However, it is important to note that the choice of algorithm should be tailored to the specific dataset and research question. In addition, the results suggest that a comprehensive approach that incorporates different feature categories may lead to more accurate prediction of COVID-19 mortality. In general, the results suggest that the performance of ML models is influenced by the number and type of features in each category. While some models consistently perform well across different categories (e.g., XGBoost), others perform better for specific types of features (e.g., SVM for Demographics).
Analysis of the importance of characteristics using SHAP values revealed critical factors affecting model results. D-dimer values, CPR, PEEP, underlying diseases, and ESR emerged as the most important features, highlighting the importance of these variables in predicting COVID-19 mortality. These results provide valuable insights into the underlying mechanisms and risk factors associated with severe COVID-19 outcomes.
The types of features used in ML models fall into two broad categories: quantitative (numerical) and qualitative (binary or categorical). The performance of ML methods can vary depending on the type of features used. Some algorithms work better with quantitative features, while others work better with qualitative features. For example, decision trees and random forests work well with both types of features [64], while neural networks often work better with quantitative features [65,66]. Accordingly, we consider these levels for the features under study to better assess the impact of the data.
The success of ML algorithms depends largely on the quality and quantity of the data on which they are trained [67][68][69]. Recent research, including the 2021 study by Sarker [26], has shown that a larger amount of data can significantly improve the performance of deep learning algorithms compared to traditional machine learning techniques. However, it should be noted that the effect of data size on model performance depends on several factors, such as data characteristics and experimental design. This underscores the importance of carefully and judiciously selecting data for training.

Limitations
One of the limitations of this study is that it relies on data collected from a single hospital in Abadan, Iran. The data may not be representative of the diversity of COVID-19 cases in different regions, and there may be differences in data quality and completeness. In addition, retrospectively collected data may have biases and inaccuracies. Although the study included a substantial number of COVID-19 patients, the sample size may still limit the generalizability of the results, especially for less common subgroups or certain demographic characteristics.

Future works
Future studies could adopt a multi-center approach to improve the scope and depth of research on COVID-19 outcomes. This could include working with multiple hospitals in different regions of Iran to ensure a more diverse and representative sample. By conducting prospective studies, researchers can collect data in real time, which reduces the biases associated with retrospective data collection and increases the reliability of the results. Increasing sample size, conducting longitudinal studies to track patient progression, and implementing quality assurance measures are critical to improving generalizability, understanding long-term effects, and ensuring data accuracy in future research efforts. Collectively, these strategies aim to address the limitations of individual studies and make an important contribution to a more comprehensive understanding of COVID-19 outcomes in different populations and settings.

Conclusions
In summary, this study demonstrates the potential of ML algorithms in predicting COVID-19 mortality based on a comprehensive set of features. In addition, the models were made interpretable using SHAP-based feature importance, which revealed the variables most strongly associated with mortality. This study highlights the power of data-driven approaches in addressing critical public health challenges such as the COVID-19 pandemic. The results suggest that the performance of ML models is influenced by the number and type of features in each feature set. These findings may be a valuable resource for health professionals to identify high-risk COVID-19 patients and allocate resources effectively.

Fig. 2 Feature importance based on SHAP values. The mean absolute SHAP values are depicted to illustrate global feature importance. The SHAP values change in the spectrum from dark (higher) to light (lower) color

Fig. 3 The SHAP-based feature importance of all categories (I to VII) for COVID-19 mortality prediction, calculated with the XGBoost model. The local explanatory summary shows the direction of the relationship between a feature and patient outcome. Positive SHAP values indicate death, whereas negative SHAP values indicate survival. As the color scale shows, higher values are blue while lower values are orange

Table 1
Baseline characteristics of patients infected with COVID-19. † P-value obtained from independent t-test; ‡ P-value obtained from Chi-

Table 2
Clinical and conditions characteristics of patients infected with COVID-19. CPR, cardiopulmonary resuscitation. † P-value obtained from independent t-test; ‡ P-value obtained from Chi-square test

Treatment
The treatment characteristics of the COVID-19 patients and the resulting outcomes are shown in Table 4. The table shows the frequency of patients who received different types of medications or therapies during their treatment. According to the results, the use of antibiotics (35.1%), remdesivir (29.6%), favipiravir (36.0%), and Vita-

Table 6 provides information on the symptoms of patients infected with COVID-19 by survival outcome. The table also shows the frequency of symptoms among patients.
The most common symptom reported by patients was fever, which occurred in 67.0% of surviving and deceased patients.Dyspnea and nonproductive cough were the second and third most common symptoms, reported by

Table 3
Comorbidities characteristics of patients infected with COVID-19. HTN, hypertension; DM, diabetes mellitus; CVD, cardiovascular disease; CKD, chronic kidney disease; COPD, chronic obstructive pulmonary disease; HIV, human immunodeficiency virus; HBV, hepatitis B virus; Respiratory, such as influenza, pneumonia, asthma, bronchitis, and chronic obstructive airways disease; GI, gastrointestinal; Neurology, such as epilepsy, learning disabilities, neuromuscular disorders, autism, ADD, brain tumors, and cerebral palsy; Liver, such as fatty liver disease and cirrhosis; Hematology, blood disease; Dermatology, skin diseases; Psychology, mental disorders. ‡ P-value obtained from Chi-square test

Table 4
Treatment characteristics of patients infected with COVID-19. IVIg, intravenous immunoglobulin; NSAIDs, non-steroidal anti-inflammatory drugs; ACEi, angiotensin converting enzyme inhibitors; ARB, angiotensin II receptor blockers; Zn, zinc. ‡ P-value obtained from Chi-square test

Table 5
Initial vital sign characteristics of patients infected with COVID-19. HR, heart rate; BPM, beats per minute; RR, respiratory rate; T, temperature; SBP, systolic blood pressure; DBP, diastolic blood pressure; MAP, mean arterial pressure; SpO2, oxygen saturation; PaO2, partial pressure of oxygen in arterial blood; PEEP, positive end-expiratory pressure; FiO2, fraction of inspired oxygen; Pneumonia, radiography (X-ray) test result. † P-value obtained from independent t-test; ‡ P-value obtained from Chi-square test

Table 6
Symptoms of patients infected with COVID-19. Olfactory, smell disorders; Dyspepsia, indigestion; LOC, level of consciousness; MODS, multiple organ dysfunction syndrome; Hemoptysis, coughing up blood; Coagulopathy, bleeding disorder; Hyperglycemia, high blood glucose; ICU, intensive care unit. † P-value obtained from independent t-test; ‡ P-value obtained from Chi-square test

Table 8
Performance comparison of ML models by feature sets in predicting mortality from COVID-19. The average values from the test set are expressed as the mean (SD). DT, Decision Tree; XGBoost, eXtreme Gradient Boosting; SVM, Support Vector Machine; NB, Naïve Bayes; RF, Random Forest